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POISSON INVERSE PROBLEMS 1 

By Anestis Antoniadis and Jeremie Bigot 

University Joseph Fourier and University Paul Sabatier 

In this paper we focus on nonparametric estimators in inverse 
problems for Poisson processes involving the use of wavelet decom- 
positions. Adopting an adaptive wavelet Galerkin discretization, we 
find that our method combines the well-known theoretical advan- 
tages of wavelet-vaguelette decompositions for inverse problems in 
terms of optimally adapting to the unknown smoothness of the solu- 
tion, together with the remarkably simple closed-form expressions of 
Galerkin inversion methods. Adapting the results of Barron and Sheu 
[Ann. Statist. 19 (1991) 1347-1369] to the context of log-intensity 
functions approximated by wavelet series with the use of the Kullback- 
Leibler distance between two point processes, we also present an 
asymptotic analysis of convergence rates that justifies our approach. 
In order to shed some light on the theoretical results obtained and to 
examine the accuracy of our estimates in finite samples, we illustrate 
our method by the analysis of some simulated examples. 

1. Introduction. In this article the problem of estimating nonparametri- 
cally the intensity function of an indirectly observed nonhomogeneous Pois- 
son process is considered. Such a problem arises when data (counts) are 
collected according to a Poisson process whose underlying intensity is indi- 
rectly related by a linear operator K to the intensity (the object that we 
wish to estimate) of another Poisson process. This kind of indirect problem 
is referred to as a Poisson inverse problem. In rigorous probabilistic terms, 
let F be a nonobservable Poisson process on a measure space (Eq, B(Eq), /iq) 
and let tf{x) be its intensity function with respect to the measure /io on 
B(Eq); that is, for any set A G B(Eq), the number of points of F lying 
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in A is a random variable F{A) which is Poisson distributed with param- 
eter f A tf(x)dfio(x), and for any finite family of disjoint measurable sets 
A\, . . . , A n of Eq, .F(^4i), . . . , F(A n ) are independent random variables. For 
studying asymptotic properties, the function /, referred to as the scaled in- 
tensity function, is held fixed and the positive real t, referred to as the "ob- 
servation time," increases. The observable data form another Poisson pro- 
cess G on, possibly, another measure space (Ei,B{E\), n\) with an intensity 
function th(y) with respect to a measure The scaled intensity functions 
/ and h, considered as elements of the separable Hilbert spaces L 2 (Eq,/iq) 
and L 2 (E\, hi), are related by an operator equation h = Kf for some lin- 
ear compact operator K mapping L 2 (Eq,^q) into L 2 {E\, fii). Observing the 
point process G must be understood in a measure sense. Assuming therefore 
that for any v £ L 2 (£'i,/xi) we observe / vdG, the natural goal is to estimate 
the scaled intensity /. In many applications, K is an integral operator with 
a kernel representing the response of a measuring device; in the special case 
where this linear device is translation-invariant, K reduces to a convolution 
operator. Examples range from all kinds of image deblurring models, math- 
ematical models for positron emission tomography and nuclear magnetic 
resonance, or unfolding problems in stereology and high-energy physics, to 
cite only a few. Solving such problems, that is, recovering /, is often difficult 
since in cases which are of most interest scientifically, K is not invertible; 
that is, K~ l does not exist as a bounded linear operator so that a small 
perturbation in the data may lead to very different solutions to the recovery 
problem. 

Related problems of inverse estimation for linear inverse problems with ad- 
ditive normal noise have been proposed in the literature, including smooth- 
ing kernel methods [12], smoothing spline methods [21, 22], Gauss-Chebyshev- 
type quadrature methods for solving integral equations [19] and singular 
value decomposition (SVD) methods [11, 23, 25], to cite only a few. Wavelet 
and multiscale analysis regularization methods for inverse problems have 
also recently received considerable attention in the statistics literature, ex- 
ploiting the fact that wavelets provide unconditional bases for a large vari- 
ety of smoothness spaces. Fan and Koo [9] have focused on nonparametric 
deconvolution density estimation based on wavelet techniques. Donoho [7] 
proposed the wavelet-vaguelette decomposition (WVD), which works by 
expanding the function / in a wavelet series XX/> V'j^V'j.fcj constructing 
a corresponding vaguelette series for Kf and then estimating the coeffi- 
cients using a suitable thresholding approach. Donoho showed that a WVD 
is optimal in a minimax sense among all linear and nonlinear estimators for 
inverting certain types of linear operators, including the Radon transform. 
Kolaczyk [15] has numerically investigated the use of a WVD for tomo- 
graphic reconstruction, whereas Abramovich and Silverman [1] have theo- 
retically and numerically studied variants of the WVD. A drawback of these 
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methods is that they are limited to special types of operators K (essentially 
homogeneous operators or convolution-type operators under some additional 
technical assumptions) because one essentially needs to calculate precisely 
the K~ 1 ipj i f c - Cohen, Hoffmann and Reiss [5] have explored the application 
of Galerkin-type methods to white-noise embedded inverse problems, using 
an appropriate but fixed wavelet basis. The underlying intuition is that the 
inversion process required by WVD methods needs only to be accurate to 
a certain error level if the object to be recovered is mostly smooth with 
some singularities, and therefore the inversion can be performed approxi- 
mately using a Galerkin scheme. However, most of the techniques developed 
to date have been designed for Gaussian noise models and are not directly 
applicable in Poisson inverse problems. 

For Poisson inverse problems an alternative approach has been proposed 
by Szkutnik [26] using a quasi-maximum likelihood (QML) histogram sieve 
estimator when restricting h to step functions. Another recent attempt 
for solving related Poisson discrete inverse problems is a Bayesian multi- 
scale framework for Poisson inverse problems proposed by Nowak and Ko- 
laczyk [20], extending their earlier work for problems involving direct Pois- 
son observations (see, e.g., [14, 16]) and based on a multiscale factorization 
of the Poisson likelihood function induced by recursive partitioning of the 
data space. Regularization of the solution is accomplished through usage of 
formal prior probability distributions in a Bayesian paradigm and the solu- 
tion is a maximum a posteriori estimator, computed using the expectation- 
maximization (EM) algorithm. However, the inverse problems addressed by 
the above authors are discrete inverse problems (Poisson sampling from a 
discretized intensity related to a discretized version of the intensity of in- 
terest through multiplication by a matrix of transition probabilities), and 
the question up to which accuracy should the operators be discretized is 
not discussed. Similarly, the work of Cavalier and Koo [3] on hard threshold 
estimators in the tomographic data framework has shown that for a par- 
ticular operator (the Radon transform) an extension of WVD methods for 
Poisson data is theoretically feasible. It is, however, worthwhile pointing out 
that the authors do not provide any computational algorithm for computing 
the estimate and do not address the problem of imposing positivity of the 
estimator since Poisson intensity functions are nonnegative by definition. 

Encouraged by the developments cited above and inspired by the WVD 
methods for solving inverse problems, we explore in the sequel an alterna- 
tive approach via wavelet-based decompositions combined with thresholding 
strategies that address adaptivity issues. Specifically, our framework extends 
the wavelet-Galerkin methods of Cohen, Hoffmann and Reiss [5] to the Pois- 
son setting. Rates of convergence are derived for linear and nonlinear esti- 
mators, in analogy to classical wavelet estimators based on projections and 
thresholding, respectively. In order to ensure the positivity of the estimated 
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intensity, the log- intensity is expanded in a wavelet basis. The derivation 
of our results takes place within an extension of the paradigm developed 
by Barron and Sheu [2] and involves the adaptation of recent techniques 
on concentration inequalities for suprema of integral functionals of Pois- 
son processes which are analogous to Talagrand's inequalities for empirical 
processes. Although there are close similarities between wavelet-Galerkin 
techniques and earlier techniques based on WVD or VWD systems, the use 
of the wavelet-Galerkin machinery allows us to address inversion under a 
broad class of operators (i.e., not just homogeneous operators) and to take 
advantage of certain computational efficiencies. 

The rest of this paper is organized as follows. While the Galerkin ap- 
proach of Cohen, Hoffmann and Reiss [5] is relatively easy to describe when 
the inversion problem is a white-noise embedded problem, this is not the 
case for Poisson inverse problems. After fixing the notation and recalling 
some basic definitions, Section 2 contains an equivalent formulation of the 
wavelet-Galerkin approach for log-intensities that involves a notion of in- 
formation projection similar to the one used by Barron and Sheu [2] for 
estimating a density, which is developed in Section 3. In Section 4 a lin- 
ear estimator for the linear inverse problem at hand is proposed using the 
appropriate type of wavelets adapted to our case. In the spirit of wavelet 
denoising methods [1, 7], and in order to gain in adaptivity, we then im- 
prove, in Section 5, the estimator by applying a soft-threshold nonlinearity 
to the Galerkin- vaguelette coefficients. The last section is devoted to the nu- 
merical implementation of our procedures. We present the results of a small 
Monte Carlo experiment designed to study the finite-sample behavior of our 
estimates. Technical proofs are given in the Appendix. 

2. Preliminaries and notation. In this section we establish the notation 
and the general framework of the models which are adopted in this paper for 
the Poisson inverse problem formulated in the Introduction. For this pur- 
pose let F and G be two Poisson point processes on Borel measurable spaces 
(Eq,B(Eo), Ho) and (E\,B(Ei), m), respectively. Associated with these Pois- 
son processes are the intensity measures defined by 

(a) X F (B) = E(F(B)) = J B tf(x) d^{x),B e B(E ), 

(b) X G (B') = E(G(B')) = f B , th(x) dni(x),B' G B{Ex), 

where t is an "observation time" which will tend to infinity in our asymp- 
totic considerations. Observing the process G, we consider the problem of 
estimating the scaled intensity function /, when the scaled intensity h re- 
sults from the action of a compact self-adjoint positive definite operator 
K : L 2 (Eq, /io) — ► L 2 (Ei,ni) on the intensity function of the process F, that 
is, h = Kf. To simplify the notation, we will assume in the following without 
any loss of generality that the observation and unknown domains Eq and 
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E\ are identical Borel subsets of M. d (d> 1), say E, and that jjlq = fii = \i 
where /i denotes Lebesgue measure. A discussion of how one can handle the 
case Eq / Ei or K not self-adjoint positive definite is deferred to the end of 
this paper. 

In order to estimate the unknown intensity function /, we will approx- 
imate the logarithm of the intensity by a standard wavelet basis function 
expansion. A notable advantage of using such an exponential family intensity 
estimation is that it forces positivity of the resulting estimator, which is not 
shared by other traditional methods of nonparametric intensity estimation 
such as kernel estimators and orthogonal series expansions of the intensity 
rather than the log-intensity. To assess the quality of the estimation, we 
will measure the discrepancy between an estimator ft and the true inten- 
sity function / in the sense of relative entropy (Kullback-Leibler distance) 
between two point processes, 

A(/;A) = |(/log(i)-/ + / t )d/i, 

where the logarithms above are taken with base e. One can show (see 
Lemma VII. 3 of [3]) that the above distance of the two intensities is also the 
Kullback-Leibler distance between the corresponding Poisson processes. It 
is well known that A is nonnegative and equals zero if and only if ft = / a.e. 
The intensity ft in the exponential family that is closest to / in this relative 
entropy sense is the so-called information projection of / [6]. 

The ill-posed nature of the problem comes from the assumption that K is 
compact and therefore its inverse is not L 2 -bounded. As in [5] we will express 
the ill-posed condition of K by a smoothing action: K will map L 2 (E,/i) 
into some smoothness space H r for some r > 0. Following the notation in 
the paper cited above, we will say that K has the smoothing property of 
order v > if K maps the Sobolev space H s onto H s+V or the Besov space 
Bp q onto Bp^ u . Recall that the Sobolev space H S (M), s£R, is the space of 
tempered distributions v such that 

\\vf s = I (i + iei 2 ri^(e)i 2 ^<oo, 

Jr 

where 

f e i!it v{t)dt 
Jm. 

denotes the Fourier transform of v. The Besov spaces form another particular 
family of smoothness spaces. Essentially the Besov spaces B^ q (R d ) consist 
of functions that "have s derivatives in L p " ; the parameter q provides some 
additional fine-tuning to the definition of these spaces. 
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For a self-adjoint positive definite operator K, the smoothing property 
can be expressed by the ellipticity property, 

(2-1) (Kf,f)~\\ff H _ v/2 , 

where (•,•) denotes the standard inner product on L 2 (E,/i) and H~ u is the 
dual space of H v appended with appropriate boundary conditions depending 
on the problem (homogeneous, periodic, etc.) (see [5]). 

As already explained in the Introduction, a key ingredient for solving the 
Poisson inverse problem is the use of standard wavelet bases of L 2 (E,/i) 
which allow the characterization of the function spaces that describe both 
the smoothness of the solution and the smoothing action of the operator 
K, since wavelet bases provide also an unconditional basis for a variety 
of other useful Banach spaces of functions, such as Holder spaces, Sobolev 
spaces and, more generally, Besov spaces. Assume that we have a scaling 
function (ft and a wavelet function if). Scaling and wavelet functions at scale 
j (i.e., resolution level 2 3 ) will be denoted by and where the index 
A summarizes both the usual scale and space parameters j and k [e.g., 
for one-dimensional wavelets, A = (j, k) and tpj ^ = 2^ 2 ip(2 3 ■ —k)]. If d > 2, 
the notation stands for the adaptation of scaling and wavelet functions 
to multidimensional domains. The notation |A| = j will be used to denote 
a wavelet at scale j, while |A| < j denotes some wavelet at scale j' , with 
< j' < j (we shall assume, merely for notational convenience, that the 
usual coarse level of approximation jo is equal to 0). With this notation, we 
assume that: 

(a) The scaling functions (<^_x ) | a|=j span a finite-dimensional space Vj 
within a multiresolution hierarchy Vq C V\ C ■ • ■ C L 2 (E,fi), such that 
dim(V 7 ) = 2 jrf (periodic wavelets for notational convenience). 

(b) We are in the orthonormal case, that is, the scaling functions (0a)|a|=j 
are an orthonormal basis of Vj, and the wavelets (ijj\)\\\=j form an orthonor- 
mal basis of Wj which is the orthogonal complement of Vj into Vj+i. 

(c) For any g G L 2 (E,n), its wavelet decomposition can be written as 

oo 

|A|=0 i>0|A|=j 

where r A = (g, cf>\) and (3 X = (g, ip\). 

(d) To simplify the notation we shall use the convenient slight abuse of 
notation that sweeps up the coarsest-j scaling functions into the tp\ as well, 
that is, we will sometimes write (ip\)\x\=-i f° r (</>a)|a|=o- We thus denote the 
complete <i-dimensional, inhomogeneous wavelet basis by A € A}. By 
truncating the wavelet decomposition at level j, we obtain the orthogonal 
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projection onto Vj, 



Pj9= E 

W<3 



(e) We also assume that ||^a||oc = ||V>||oo2 |A|d/2 . 

Wavelets provide unconditional bases for the Besov spaces, and one can 
express whether or not a function g on E belongs to a Besov space by a 
fairly simple and completely explicit requirement on the absolute value of the 
wavelet coefficients of g. More precisely, let us assume that the original one- 
dimensional (f) and ip are in C L (R), withL > s, that a = s + d(l/2-l/p) > 0, 
and define the norm || • \\ s ,p,q by 

/ oo / \ <j/p\ 1/9 

\\g\\s, P , q = E 2jVp E \Mx)\ p ) 

\j=0\ AgA,| A|=j / / 

Then this norm is equivalent to the traditional Besov norm, that is, there 
exist strictly positive constants A and B such that 

A\\g\\ s ,p, q <\\g\\ B s jq <B\\g\\ S)P)q . 

The condition that a > is imposed to ensure that Bp q (M. d ) is a subspace 
of L 2 (R d ); we shall restrict ourselves to this case in this paper. 

To end this section, and since our estimation procedures will be based on 
a wavelet-Galerkin projection method, we recall here some useful results on 
linear Galerkin projection methods for solving linear problems h = Kf. For 
a more detailed description the reader is referred to the the fairly extensive 
presentation in the paper by Cohen, Hoffmann and Reiss [5]. 

Let / £ L 2 (E,fi); then the function fj G Vj is said to be the Galerkin 
approximation of / if for all v G Vj 

(Kfj,v) = (Kf,v). 

Let Fj G M. 2jd be the vector of wavelet coefficients of fj G Vj] then the 
Galerkin projection method for approximating / amounts to solving the 
linear system 

K jGj = G K , 

where Kj = ((K-ip\,ip K ))\x\<j,\ K \<j is a symmetric positive definite matrix and 
G K = ((Kf, ip K ))\n\<j is a "data" vector. Now, define the Galerkin wavelets 
u J x G Vj as 

(2.2) (Ku{,v) = (ip x ,v) forallwGVj. 
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Let U x be the vector of wavelet coefficients of u x £Vj; then 

where = ((ip\,'^n))\K\<j is a vector with zero entries except for the Ath 
component which is equal to 1. Note that 

(u i x ,Kf) = (U{) T ((^Kf)) lKl<J 
= (U{fG K 

= ^>lK- l G K = ^lG j = G j ^ 

where Gj \ = (fj,ip\) denotes the Ath component of Gj. Hence, if we define 
fj S Vj by (fj,ipx) = (Kfi u \)i then /,• is the Galerkin approximation of /. 

3. Information projection-based estimation. Information projection for 
the estimation of density functions has been studied by Barron and Sheu [2] . 
They obtained various existence results and asymptotic bounds for the dis- 
tance / plog(p/q) between two probability density functions p and q. Their 
estimation procedure is based on sequences of exponential families spanned 
by orthogonal functions such as polynomials, splines and trigonometric se- 
ries. Estimation of density functions by approximation of log-densities with 
wavelets has been considered by Koo and Kim [17]. 

We adapt in this section the results of Barron and Sheu [2] to the context 
of log- intensity functions approximated by wavelet series with the use of the 
Kullback-Leibler distance between two point processes. More precisely, let 
j > 0. If 9 denotes a vector in R 2J , then 6\ denotes its Ath component. The 
wavelet-based exponential family £j at scale j will be defined as the set of 
functions 

£j = |/i,e(0 =exp ( E GxM-^o = (e x ) w<j e R 23d j. 

Following Csiszar [6], the intensity fjfi in the exponential family £j that is 
closest to the true intensity / in the relative entropy sense is characterized 
as the unique intensity function in the family for which (f-g^i^x) = (ZjV'a)- 
It seems therefore natural to estimate the unknown intensity function / by 
searching for some 6t £ M 2J such that 

(f jt § t M = ^ j u{dG = a x for all | A| < j. 

If there exists a solution to this problem, then / . ^ will be called the Galerkin 
information projection estimate of / at scale j, since in the context of 
the wavelet-Galerkin approach for solving y = Kf + adW, the estimation 
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(y,u x ) of (Kf,u J x ) is replaced by j J u J x dG = a\, while (fj,ip\) is replaced 

by (fj&M- 

We already pointed out the advantage of such an approach since one can 
guarantee that the intensity function estimates are positive. The following 
lemma states some of the I-projection properties onto £j (see also [6]). 

Lemma 3.1. Let a G M 2 ^ . Assume that there exists some 9(a) G M 2 ^ 
such that for all |A| < j 

Then, for any intensity function f G L 2 (E, /x) such that (/, i^x) = a\ and for 
all 9 G M. 2jd , the following Pythagorean-like identity holds: 

A(/; ./•„,; = A!/: f jMa) ) + Ai / ; , ):(! : . /,,,). 

A consequence of the above lemma, and since A(/;/t) > unless / = h 
almost everywhere, is that 6(a) (if it exists) uniquely minimizes A(f;fj j g) 
for 6eR 2Jd . 

From now on assume that there exists some constant Aj < oo such that 
for all v G Vj 

\\v\\oo < Aj\\v\\ L 2. 

A key lemma relating distances between the intensities in the parametric 
family to distance between the corresponding wavelet coefficients is then 
the following. 

Lemma 3.2. Let 6q G M? Jd , ao,A = (fj,e ,^x) and a G M? Jd be a given 
vector. Let b = exp(|| logC/^JHoo) and e = exp(l). If \\a - a \\ 2 < ^dxj' 
then the solution 9(a) to 

(fj,e(a),^x) = a\ for all |A| < j 

exists and satisfies 

(3.1) ||W-0o||2<2e&||a-ao|| 2 , 

'fj,9(a ) 



(3.2) 



log 



< 2e6A,-||a — aolb. 



fj,8(a) 

(3-3) A(fj,e( ao yJj,e(a)) < 2e6||a — a ||l- 

The proof of this lemma relies upon a series of lemmas on bounds within 
exponential families for the Kullback-Leibler distance and is given in the 
Appendix. 
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4. Linear estimation. Let M be some fixed constant and let Fp q (M) 
denote the set of scaled intensity functions such that 

F^ q (M) = {f = exp(g), \\g\\ B . q <M}. 

Note that assuming that / G Fp q (M) implies that / is strictly positive. 
For / G Fp q (M), let g = log e ('/) and define 

Dj = \\g-Pjg\\ L 2, 

7j = Il0--Pj0l|oo- 

Basic to our analysis is a decomposition of the relative entropy between 
the true and the estimated intensities into the sum of two terms which 
correspond to approximation error and estimation error (bias and variance in 
a familiar mean squared error analysis). The proof of this result, which relies 
upon some concentration inequalities for Poisson processes, is postponed to 
the Appendix. 

Theorem 4.1. Assume that ip is compactly supported and that f G 
F^ q {M) (with s > d/p > d/2). Let M\ > 1 be a constant such that Mf 1 < 
/ < Mi (see Lemma KA), and let £j = 2Mfe 2 ^ +1 D j A j . If e j < I, the in- 
formation projection exists, that is, there exists 9j G M. 2J such that 

{f 3,0] M = (fM for all\X\< j, 

and the approximation error satisfies 

A(/ : ./;,,,; )<:r' ( -/;j. 

Moreover, suppose that if) is in }{ s + d /' 2 ~ 1 ' (with s > v — d/2) and has r 
vanishing moments with r > s + d/2. Let 5j = 4M 2 e 2ej+2 ' yj+ ' 2 A 2 pj t t, where 
p jt = (2J^+ rf / 2 )/^+2^ + ( 3 / 2 ) d )/i) 2 + 2- 2 -? s . 1/5*. < 1, then for every 7] 2 < i 

i 

there is a set of probability less than exp(— rf), such that outside this set there 
exists some Ot G M. 2jd which satisfies 

(fj&M = ]J u i dG f° r dl l A l < i> 

and the estimation error satisfies 

A(f Jter ,f jA )<Cr, 2 e 1+ ^ Pl , t . 

Note that by using the above theorem, explicit bounds are obtained which 
are applicable for each finite value of j and t, subject to Ej and 5j < 1. We 
can now state the general result on the nonadaptive Galerkin information 
projection estimator of the unknown intensity function. 
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Theorem 4.2. Assume that ift is compactly supported and that f € 
i*2 2 (M) (with s > d/2). Moreover, suppose that if) is in }j s + d / 2 ~ u ( w uh 
s > v — d/2 ) and has r vanishing moments with r>s + d/2. Let j(t) be such 
that 2 _J 'W = (I)V(2s+2W-<i)_ Then, with probability tending to I as t -> oo, 
the Galerkin information projection exists and satisfies 

/ /i \ 2s/(2s+2u+d)\ 

The above estimator therefore converges almost surely with the optimal 
rate for intensities in F^iM). However, the main defect of the estimator 
defined in Theorem 4.2 is that it is suited for smooth functions and does not 
attain the optimal rates when, for example, g = log(/) has singularities. We 
therefore propose in the next section another estimator derived by applying 
an appropriate nonlinear thresholding procedure. 

5. Nonlinear estimation. It is well known that linear estimators do not 
achieve the optimal rates of convergence when the functions to be recovered 
belong to Besov spaces BL 1 with index 1 < p < 2 (the case of functions which 
are not very smooth) . In order to attain such a rate we need therefore some 
kind of nonlinear procedure and this is our aim in this section. 

Our estimation procedure simply consists of applying a soft thresholding 
algorithm on the "data" to which we apply the Galerkin information projec- 
tion inversion which was described previously, exploiting the fact that the 
model with Poisson intensity is not too different from the usual Gaussian 
white-noise model. 

Let us first recall that the coefficients defining the Galerkin information 
projection estimate of / at scale j, as derived in the previous section, are 
given by 

&t,x = ] j u{dG = {U{) T - t ( K j V M dG) <; 

where Jj{ =KJ l ^\. 

For some j >0 (to be fixed further) , we define the thresholded coefficients 
Pt{&t,x) = {U{) T (r e(t) {^J '^dG^j ^ for all |A| < j, 

where T E ^{x) = sign(x)(x — s{t)) + for x G R denotes the usual soft thresh- 
olding operator with threshold e{t). 

In order to build an optimal solution for the Poisson inverse problem we 
will use a level-dependent wavelet thresholding procedure by setting e(t) = 
^-i/22^|A|^| logt|. The role of 2^' A ' is to take into account the amplification 
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of the noise by the inversion process. The following theorem shows that 
the resulting estimator behaves in an optimal way provided that the cutoff 
resolution level j(t) is chosen such that 2 _ - J ^ < t~ l ^ 2u \ where v is the 
degree of ill-posedness of the estimator. 

Theorem 5.1. Assume that tjj is compactly supported and that f G 
Fp p (M) with s > and 1/p = 1/2 + s/(2v + d). Moreover, suppose that 

ij) is in s > v - d/2 ) and has r vanishing moments with 

r > s + d/2. Also assume that K is an isomorphism between L 2 and H u 
and that it has the smoothing property of order u with respect to the space 
Bp p . Then, the above described Galerkin information projection estimator, 

minimax rate 



sa y fj{t),9 t > satis fi es the 



E(A(/;/ j(i)A ))<0(( 7V W) )> 
provided that j(t) is such that 2~ j ® < t' 1 ^ 2 ^. 

Note that the lower bound on j(t) does not depend on the unknown 
smoothness of / and therefore Theorem 5.1 allows us to build an adaptive 
solution to our Poisson inverse problem. The assumption that K~ l maps 
H v into 1? in the above theorem is also implicit in the vaguelette-wavelet 
method of Donoho [7] for white-noise inverse problems. 

6. Implementation and some numerical results. The purpose of this sec- 
tion is to describe the implementation of our approach and to briefly explore 
the performance of our method from a numerical point of view. As in [5] we 
will focus on a simple example of a logarithmic potential kernel in dimen- 
sion 1. We will consider its action on two typical test intensity functions, 
which, together with their folded versions by the action of K, are displayed 
in Figure 1. 

The logarithmic potential operator K that we will consider is defined by 



Kf(x)= I' k(x,y)f(y)dy, 



where 



k(x,y) = _1 °gQ 



o 



. y 

sin — 



x, ye [0,1]. 



,2 

Such a kernel is singular on the diagonal x = y but integrable. The corre- 
sponding operator is known to be an elliptic operator of order —1, which 
maps H~ 1 / 2 into H 1 / 2 and therefore satisfies the assumptions made in this 
paper with v = 1. The first test function we will consider is 

f(x) = max{l - |30(x - 0.5)|,0.1}, x £ [0, 1], 
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Fig. 1. Artificial intensity functions (left) and their folded versions (right) by the action 
of the logarithmic potential kernel. Top-left: the intense peak; bottom-left: the burst-like 
intensity. 



which presents an intense peak and is badly approximated by the singular 
functions of K but has a very sparse representation in a wavelet basis. We 
will also consider a fast rise-exponential decay model, giving rise to the ab- 
breviation "FRED" in the astronomy literature, to model burst phenomena 
which are of the form f(x) = fo + Ya=i fi( x ) where /o models the relatively 
constant background level of gamma-ray photon arrivals and fa models the 
ith. peak in the burst of a form 

m i|/°V,i)' if x < m ii 

rriil/a^i), if x>rrii. 

In the expression above m; denotes the location of the ith peak, and the 
factors a,}, a T: i, a^i and i>i control respectively the amplitude, the rise, the 
decay and the peakedness. 

Most often data can only be observed in binned form because of the 
discrete nature of the measurement apparatus or because binning may be 
enforced by data handling and computing efficiency We therefore have cho- 



fa{x) 



ctiexp(- 
a;exp(- 



\x 
\x 
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sen to discretize K for a maximal resolution level J = 1 1 by computing the 
stiffness matrix Kj with entries 



was computed by Riemann approximation at scale 2 . Note that the kernel 
k is such that k(x,y) = h(x — y) where h(-) is a 1-periodic function. The dis- 
cretization Kj of K is therefore a Toeplitz cyclic matrix and the fast Fourier 
transform makes the computation of the action of K on functions approxi- 
mated in the Haar basis numerically fast and easy. But this is not the only 
reason we have used the Haar-based discretization for both / and Kj. More 
specifically, the Haar scaling basis at resolution J induces a partition of the 
interval [0, 1) into 2 J disjoint and measurable bins B^ = [k2~ J , (k + 1)2~ J ). 
Integrating the function tKjf with respect to the Poisson counting measure 
G simply leads to observed data consisting of counts observed in the bins 
Bf-. By the Poisson nature of G, these are independent Poisson counts with 
expected values within each bin t J B h, k = 0, . . . , 2 J — 1. Moreover, taking 
a high-resolution J permits the approximation of J B h by 2~ J h(k/2 J ) and 
this is what we have done for creating the simulated data in the examples. 

For the examples treated in this paper, the estimation was implemented 
using Symmlets with six vanishing moments. Since the set {x±, . . . ,x n } of 
the n = 2 J points at which the data is sampled is dyadic, any scalar product 
involving a wavelet at a lower resolution is computed via the discrete wavelet 
transform. The information projection estimator was obtained by solving the 
system of equations given in Theorem 5.1 at some maximal resolution J max - 

To find the estimate Ot we have used, inspired by a similar approach 
in [13], a modified version of the Newton-Raphson method. Let S(6) denote 
the J max -dimensional vector of elements 



and H{6) the J max x ^max Hessian matrix whose (A, A') entry is given by 



The method to compute 6t is to start with an initial guess and iteratively 
determine m+1 according to 



where the (f)j t k 
integral 



(Kj)e,k=o,...,2-'-i = ((^J^J/ 5 ^J,fc))^,fc=o,...,2 J -i) 
_ 2 J / 2 /[ fe2 - J ,(fc+i)2- J ) are the Haar scaling functions. Each 




S(0) x = (Pt(&t,\)-(ft,oM) 




e m+l =Q m + H-\e m )S{O m ), 
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with a standard criterion for stopping the iterations. 

One difficulty in implementing the above algorithm is that Symmlets have 
no closed-form functional expression and the integration involved in the 
computation of the Hessian H can be very time consuming if one has to 
compute a table of the appropriate values of the function We have used 
instead an efficient filter-bank algorithm for computing such integrals similar 
to the one used by Vannucci and Corradi [27] or Kovac and Silverman [18] 
to compute the diagonal elements of the covariance structure of the wavelet 
coefficients, which amounts to computing the fast two-dimensional wavelet 
transform of the diagonal matrix whose diagonal is the vector ft,e{ x i), i 
1 Jmax 5 and to retain only the diagonal blocks of the transform. 

For our first example, we consider the peaky function and choose a maxi- 
mal resolution J max = 10 and an "observation time" t = 10 8 (corresponding 
to the noise level used by Cohen, Hoffmann and Reiss [5] for a similar white- 
noise model). A typical sample from the simulated model is shown in the 
top- left panel of Figure 2. 

Let us recall that our nonlinear information projection estimator depends 
on the cutoff level j(t) given by 2- j< $ < (j) 1/(2v) and the level-dependent 
thresholds e(t) = 2 u \ x h~ 1 / 2 ^\ log i | . We therefore have used these expressions 
with v = 1 to estimate the unfolded intensity function. The top-right panel 
of Figure 2 displays the nonlinear Galerkin estimator for the folded Poisson 
data displayed in the top-left panel of Figure 2. We observe, and this is true 
also for the second example, that the peak is very well estimated. However, 
some oscillations are observed on the right side of the central peak. A possible 
remedy to this defect could be to use a translation-invariant procedure, but 
such an approach is beyond the scope of this paper. 

Our second example concerns the burst-like intensity function. The data 
displayed in the bottom-left panel of Figure 2 is simulated as above using for 
an unfolded intensity a burst signal with a constant intensity of 20 assigned 
to the background and three peaks. Since we are using the same logarithmic 
potential kernel to fold the intensity, we have also used here a value of v 
equal to 1. Maximal resolution, smooth cutoff level and thresholds were 
chosen exactly as in the previous example and the estimation procedure 
provides us the fit in the bottom-right panel of Figure 2 for the unfolded 
burst-like intensity, confirming the good behavior of our procedure even for 
complicated intensity structures. 

7. Conclusions. The methodology of this paper was motivated by wavelet- 
vaguelette decomposition (WVD) methods that have been developed in the 
literature for solving inverse problems with Gaussian white-noise perturba- 
tions. Such methods are most appropriate only for the restricted class of 
homogeneous operators, particularly from a computational perspective, and 
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Fig. 2. Simulated Poisson data obtained by folding the intensity functions with the loga- 
rithmic potential. Top-left: the intense peak; bottom-left: the burst-like intensity. The right 
panels display the corresponding unknown intensities (solid) and their nonlinear Garlekin 
estimates (dashed). 

extra theoretical and numerical work is required to handle more general 
operators or Poisson inverse problems. The method developed is particu- 
larly well suited for our Poisson problem because it is designed for positive 
definite operators, its numerical implementation is straightforward, it can 
easily be extended to approximately known operators and it leads to posi- 
tive estimated intensities. It combines the numerical simplicity of Galerkin 
projection methods on a high-dimensional space as an inversion procedure 
and wavelet thresholding as an adaptive smoothing technique. 

Our attention was restricted to inverse problems in which Eq and E\ are 
identical Borel subsets of M. d and the operator K is self-adjoint positive def- 
inite. In the case where Eq ^ E\ and K is not self-adjoint but just injective, 
one may choose, as is done in [5], E\ = K(Eq) and solve instead the in- 
verse problem K*h = K*Kf where K* denotes the adjoint of K. In such a 
way we are led back to the wavelet-Galerkin information projection method 
developed in this paper. 
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Another approach, which is worth investigating in the future, would be 
to derive a Landweber-type iterative algorithm that involves a denoising 
procedure at each iteration step and provides a sequence of approximations 
converging in norm to the maximum penalized likelihood minimizer as is 
done by Figueiredo and Nowak [10], who derive such an iterative algorithm 
for inverting a convolution operator acting on objects that are sparse in the 
wavelet domain. 

APPENDIX: PROOFS OF THE MAIN RESULTS 
We first derive the proof of Lemma 3.1. 

Proof of Lemma 3.1. Note that 
A(/:/,„) IHf.f,,,) I j(-f + f ma) )d^ + j{-f ma) + f h e)d^ 
where 

D(f,fj,o) = J f\og(J-^dii = D{f,f ma) )+D{f mct) ,f j>e ) 

by Lemma 4 of [2], which completes the proof. □ 

To prove Lemma 3.2 we need some preliminary lemmas on bounds within 
exponential families for the Kullback-Leibler distance; their proofs can be 
easily derived from the proofs of Lemmas 1 and 4 of [2] and thus they are 
omitted. 

Lemma A.l. Let f and h be two intensity measures in L 2 (E,fi). Assume 
that log(^) is bounded; then 

A( f;h) > le-IIW*)!!- J f (log (£) ) ' d/x, 



A(f;h)<±e^f/W°° J f(log 



Lemma A. 2. Forj>0, let 9 ,9 eM. 2dj and b = exp(|| log(/ ji0o )|| oo ); then 

'Ju 



log 



<Aj\\e 



^(fj,<h-Jj,e)>^e-^- e He -e\\l 

A(/ ifflb ;/ iltf )<^ll*- tf ll-||flo-fl|||. 
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We can proceed to the proof of Lemma 3.2. 

Proof of Lemma 3.2. This proof is inspired by the proof of Lemma 5 
of [2]. Let 

F(9)=9-a- H(9), 

where H(9) = J fj^(x)dfi(x). If a = cto, then the result is trivial. Now, if 
a^ao, note that for any 9 £ M. 2 ^ 

A (./••„,:/,,/) = a • (0 O -9) + H(9) - H(9 ). 

Hence, 

F{6 ) - F(9) = A(f jt0o] f jt e) - (oo - a) ■ (9 - 9). 
So, by Lemma A. 2 and the Cauchy-Schwarz inequality, we have that 

F(0o) - F{6) > L e -^-o\\oop Q _ e\\ 2 2 - \\ ao - ah\\e - 9\\ 2 . 
This inequality is strict if 8 ^ Oq. For all 6 such that ||0o — Q\\2 = 2e6||ao — a||2, 
F(9 Q ) - F(9) > 2eb\\a - af^e 1 -^^- 01 ^ - 1). 

The right-hand side is positive whenever 2A,-e6||ao — a\\ 2 < 1. Hence, F(0q) > 
F(9) for all 8 such that \\8q — 9\\2 = 2eb\\a^ — a\\2- Consequently, F has an 
extreme point 9* such that \\9q — 9*\\2 < 2eb\\ao — at\\2- The gradient of F at 
9* must satisfy 

(fj,6*,ip\) = a\ for all |A| < j, 

and so 9(a) = 9*. Hence, inequality (3.1) immediately follows. Inequality 
(3.2) follows from Lemma A. 2. Since F(9(a)) > F(8q), we have that 

&(fj,0(a o y,fj,0(a)) < ("o - a) • (9o ~ 9(a)) 

< \\ao - oc\\ 2 \\9 - 9\\ 2 

< 2eb\\a - 

which completes the proof. □ 

To prove the main results of this paper we shall need a series of technical 
lemmas, stated and proved below. Throughout this section, C will denote a 
constant whose value may change from line to line. 

Lemma A. 3. If tp is compactly supported, then 
E ^Va <C2^ 2 ||/^|| 2 . 

|A|=j oo 
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Proof. This lemma immediately follows from the proof of Lemma 1 
of [17]. □ 

The following lemma is similar to Lemma 2 of [17]. 

Lemma A. 4. Assume that ift is compactly supported and that f £ Fp q {M). 
If s > d/p > d/2, then there exists a constant Mi such that 

< — < / < Mi < oo. 



Proof. Let g = log(/) = E~-iE|A|=j/^A- Since \\g\\ B s q < M, 
have 

/ \ Vp 



we 



< M2" JS 



\x\=j > 

where s' = s + d(l/2 - 1/p). If s > d/p > d/2, then 

(A.i) mh<m\\ P <c2-i s '. 

Then, by Lemma A. 3 



Isiloo < £ 

i=-i 



\M=j 



i=o 

oo 

<cj2 2jW2 ~ s ' ] 

j=0 

oo 

< cj2 2 ~ i{s ~ d/p) - 

3=0 

Since s > d/p, Ej*lo 2~^ s ~ d ^ p ^ < oo and therefore there exists some constant 
Mi > 1 such that \\g\loo = ||log/||oo < logMi. □ 

Now we give bounds for Aj, Dj and -fj. 

Lemma A. 5. Assume that ip is compactly supported; then 

Aj = C2 jd / 2 . 

Moreover suppose that f S (M). If s > d/p > d/2, then 

Dj < C2- J ' (s+d(1/2 ~ 1/p)) , 
lj < C2~^ s ~ d / p \ 
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Proof. The result for Aj immediately follows from Lemma A. 3. Note 
that from (A.l), 

D j = Yj H^'lli ^ c Yl 2~ 2j ' (s+<i(1/2 ~ 1/p)) = o(2~ 2j ' (s+d(1/2 ~ 1/p)) ). 
j'>j j'>j 
By definition, 

7; = \\9 - PjOWoc < A j D j < C2-^ s ~ d M, 
which completes the proof. □ 

We may now proceed to the proofs of our main results. 

Proof of Theorem 4.1. 

Approximation error term. Let g = log(/) = X)^L-i J2\x\=j flx^x, an d for 
all |A| < j, let aj t \ = (exp(Pjg),ip\) and oi\ = (f,ijjx)- Observe that the coef- 
ficients (aj,A — «a)> 1^1 < 3-, are the coefficients of the orthogonal projection 
of / — exp(Pjg) onto Vj. Hence by Bessel's inequality 

Hay-alll < \\f - exp(Pjg)\\ 2 L2 . 

Given our assumptions on ijj and /, Lemma A. 4 implies that 

ii ii2^ m f il - eMPjg)) 2 , 

\\otj - a\\2<Mi J - — dfi, 

and so by Lemma 2 of [2], 

\\a 3 - af 2 < M^-P^ J fa _ p.gf dfl < M 2 e 2 73 D 2 

Now, apply Lemma 3.2 with #o,A = Px, ct\ = (f,tpx) f° r au \M < 3 an d b = 
e ||log(ex P (P,g))||oo. Since ||l g(//exp(P j5 ))|| 00 = 7j , we have that 
|| log(exp(P ? g))|| 00 < logAfi + 7j and therefore b < Mie 7j . From Lemma 3.2, 
we have that if Mxe'iDj < 2e ^ A , that is, if Ej < 1, then 9* = 9(a) exists. By 

Lemma 3.1 (Pythagorean-like relationship), we obtain that 

A(/;/ i|fl .)<A(/;exp(P^)). 
Thence, by Lemma A.l, 

A(/; f jfi *) < \e^~ p ^ J f(g - P 3 gf ,1,, 
which completes the proof of the first assertion of the theorem. 
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Estimation error term. Using the above notation and proof, and since 
by assumption Sj < 1, let 9j G M. 2jd be the parameter vector achieving the 
minimum of the relative entropy for intensities in the exponential family. 
For all |A| <j, define a ,A = (f,4>\) = (fj,9*,ip\) and let a t ,\ = jfu{dG. It 

is easy to see that K(6tt t \) = (u J x ,Kf). We now have 
\\a-t -«oll2= ^2 (at,\ ~ ao,\) 2 

(A.2) 



< 2 ( £ (a t , x - (u{,Kf)) 2 + ((u{,Kf) - « ,a) 2 ) . 

V|A|<7 / 



Concerning the second term of the right-hand side of inequality (A.2), using 
expression (2.2), note that 

((u{,Kf) - </, Va» 2 = ((f,Ku{ - ^ A » 2 < \\f\\UK< ~ *x\\h 

= \\f\\hm<\\h + l-2(Ku{^ x }). 

By definition we have that (Ku{,ip\) = = 1. It follows that 

\\M\\b = E E ( Ku i^ 



, ) 2 



Given the assumptions on the wavelet ip and the operator K, u° x belongs to 
H s+d/2-u anc j h ence belongs to H s+d / 2 . Since r > s + d/2, it follows 
from approximation theory that 

and therefore 

(K,k/)-(/,^a)) 2 <ii/iiI 2 2- 2 ^. 

Thence we obtain for the second term of the right-hand side of inequal- 
ity (A.2) 

(A.3) £ ((u{,Kf) - </,Va» 2 < Il/Ill 2 2- 2 ^. 

To control the first term of the right-hand side of (A.2), let Sj = span{n^; |A| < 
j} and set 

X \S j )= J2^t,x-(K,Kf)f= J2 (\ i u{dG-{ulKf)) 2 . 

\m<j \M<r J J 
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Noticing that 



/ E|A|< ^V ( 



{aeM?*;\\(,a x )\x\<jh<l}- 

we can use Corollary 2 of [24] about concentration inequalities for Poisson 
processes to get, for any w > and e > 0, 

(A.4) F{x(Sj) > (1 + e)E(xO%)) + \/l2^ + K ( e )6 oW } < exp(-w), 
where «(e) = 1.25 + 32/e, 

/•E|A|<j( a A^ A ) 2 ^, , 
w = sup / — ^^-5 iif/ d/i 



{aeRJ d ;|ja|| 2 <l} J ^ 

and 

IE|A|<j Q A«Al|oo 



6q = sup 



{aeRJ d ;||a|| 2 <l} 



t 



In what follows we provide precise control of the constants vq and 6o involved 
in inequality (A.4). It is easy to show that 

v < sup / 2^ ° AU A d V" 

1 {am^;\\a\\ 2 <l}J \|A|<j / 

For any vector a G M jd we have 

(A. 5) / ^ a A "A ^ = H aAflA ' / u A n A' dp 

Vl<3 ' |A|<i,|A'|<j 

( A - 6 ) < XI a A a A'||wilU 2 lK'IU 2 - 

|A|<j 

As argued in [5], the ellipticity property (2.1) yields 

||«il|^-„/ 2 < (Ku{,u{) = (ipx,u{) 

< II^aIMKIU 2 = II u aIIl 2 ' 

and the inverse inequality (see [5]) states that H^H^a < 2 u ^ 2 \\u J x \\ H -^/2, 
which implies that (dividing by ||wj|£,a) 

(A.7) KIU 2 < 2 ^ 

Using the above bound in the inequality (A. 6) we finally obtain 



f(j2 axU (] 2 dfJ- < 2 2 ^ f H «aV < 2 j{2u+d) \\a\\ 2 2 . 
\\\\<j J \\\\<j ) 
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It follows that 



vo<\\Kf\\ c 



2 j(2u+d) 

' t ' 



Now, by definition of the constants Aj and since J2\x\<j a x u \ 6 Vj, we have 





<A j 






W<3 


oo 


\*\<3 


I? 



and it follows that, for 1 1 ct 1 1 2 < 1, 



< 2-J ( v + d ) 



L 2 



\M<j 

which combined with the statement of Lemma A. 5 gives 

'2J("+(3/2)«0\ 

~~ * )' 



o 



Now, recall that Var(± / u{dG) = \ J(u{) 2 Kf dp. Hence, using again the 
bound in inequality (A. 7) we have 

(A.8) ]T VarQ J u{dG\ < -JK/W^^, 

which implies that 

E(x 2 (^))<^/Hoo2^ +2 ^. 
Combining (A.8) and (A. 3) yields finally 



N A l<i 



u\dG-(f^ x ) 



<2Q||^/|| 0O 2^ +2 ^ + 



1?' 



-2js 



From the Cauchy-Schwarz inequality and expression (A. 4) with s = w it 
follows that there exists a constant C > 0, such that, for any w > 

P{x 2 (5i) > C(l + w) 2 (2 j ^ +d ^ / s ft + 2^ v+ ^ d) /t) 2 } < exp(-w). 

Combining the above inequalities, and using the fact that / is bounded in 
L 2 , we get finally 



F{||a t - OolU > C(l + w) 2 ^,i} < exp(- 



-w 



It remains to set 77 = (l+io) and recall that p j>t = (2^ v+d ^ /y/i+ 2^"-K 3 / 2 )<2) / 
t)2 + 2 -2j« to get 

P{||a t - aolli > Civ 2 Pj,t} < exp(-7?). 
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Now, applying Lemma 3.2 with 8q = 9j , a = at and b = e S ^' 6 i °° , we have 
that \\log(fjfi»/exp(Pjg))\\ 00 < £j, and so K Mie e J +7 ->. Hence, if rfp^ < 
faSjPA? ' ^ na t is, if 5* < l/i] 2 , then except in the set above, Lemma 3.2 implies 
that §t = 9{&t) £ ^ 2Jd exists and satisfies 

A(/ ilfl .;/ ; . A )<2e6||a-ao||l 

< 2M 1 e 1 +^+V/^, 

which completes the proof. □ 

Proof of Theorem 4.2. From the bounds for Aj, Dj and jj given 
by Lemma A. 5 and since s > d/2, we obtain that jjU) —> as t —> oo and so 
£j(t) = 0(AjAj) = 0(2^(*)( s ~ d / 2 )). Hence, e j{t) -> as t -> oo which implies 
that 5* (t) = 0(2-J'W( 2s - d )). Since e j(t) -> and <J* (t) as t cxd, Theorem 
4.1 implies that 

Hf;fm,o h) )<o(2- 2 ^ s ), 

while for the estimation error, we have that as t — > oo, then with probability 
tending to 1, /.^ ^ exists and by the Borel-Cantelli lemma satisfies 

A(/, (t)iei(() ;/. (t)9 - t )<0(2-^(*>). 

The result now follows from the Pythagorean-like relationship (Lemma 3.1) 

A (^ ; fmfa) = A (^' fm,e* m ) + A (fm,e* m ; /,•(*),&)• □ 

Proof of Theorem 5.1. Note that 

\\s t (a t ) — «o ill = ( s t(&t,\) - (f,4>\)) 2 

< 2( £ (M«t,A) - (ulKf)f + ((u{,Kf) - (fM) 2 ) . 

From the proof of Theorem 4.1 [see (A. 3)], we have (with the assumed 
conditions on ip and K) 

J2(«,Kf)-(f^ x )f<\\f\\l 2 2-^. 

Note that the space „ is continuously embedded in whenever ( < 
s + d/2 — d/p = 2vs/(2v + d). Moreover, since 2v/(2v + d) < 1 and / is 
uniformly bounded, we therefore obtain the estimate 

£ ((u{,Kf) - (/, ^a)) 2 < 0(2-^/(2^)). 

\M<j 
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This gives the optimal order (iy ,s ^ 2s+2u+d ^ provided j(t) is large enough so 
that 2-J'W < (l)(»+d/2)/H2s+2v+d)) _ For t > 1; we haye (1)1/(2*) < 
(i)(v+d/2)/(i/(2s+2i/+d)) } gince s > Q) W ith equality if s = 0. Therefore, tt2T'® < 



(1)1/(2^)^ we obtain 



E 



(A.9) 

|A|<i 

Note also that 



((u j 



A> 



= E 

|A|<7 



A" 



Kf)-(f^ x )Y<0 - 



(u{,Kf)Y 



2s/{2s+2v+d) 



e(t) 



I' 



- / ifip dG J — (/t, tpfj) 



where h = Kf. Since K is an isomorphism between L 2 and H v ', using the 
proof of Theorem 1 of [5] we have that for any ?7 = (ka)|A|<j £ K 2 -^, 



(A.10) 



I a; 



-^III^GHC/I 



E 

|A|<j 



2 2 ^ A l|u A | 2 . 



Hence, it follows that 



( u \, K f)Y 



< 



E 

|A|<j 



6(t) 



dG 



E ( 5 *(«*.a) 

|A|<i 

We remark that r e ( t )(| / i[)\dG) — (h,ipx) is exactly the error when esti- 
mating (h,tp\) by the thresholding procedure on the "data" j J ip\dG. We 
are thus left with finding a threshold e{t) and some appropriate bounds for 
the estimation of h based on the thresholded coefficients T e ^(j J ip\dG), 
|A| < j, which would yield a bound for Ep^aj) — aolli- 

To simplify the notation set j3\ t t = t f ^A dG and let P\,o = / ip\h d[i. Since 
G is a Poisson process with intensity th it is easy to see that E(/3a*) = 
/?a,o and that Var(^A.t) = \ J ^\hd^i = \cr\. In order to bound the intensity 
estimation risk by a corresponding white-noise model risk, we will apply 
Lemma V of [3] to construct an approximation ?)a * having an exact Gaussian 
distribution with the same mean /3\,o an d the same variance Var(/A.,t)- To 
this end, let g\ = ^Va and note that / g 2 hdfi = 1 and ||#a||oo = ^H^aIIoo 



2 |A|rf/2 

recipe. 



""A 



H\, say. We construct fjxt = @\ o + t x l 2 o\Z\ by the following 
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First, if cr A > C2^ dl -^, then use Lemma V of [3] to construct Z\ and 
note that 

2 

V x ,t = nh,t - Vx,t? = ^E(t-^ 2 S x ,t - Z x ) 2 < C2\ x \ d t~\ 

where Sx,t — / ^PxidG — thd/j,). 

Second, if a\ < C2^ x ^ d lo& t - , choose an independent Z\ ~ iV(0, 1) and sim- 
ply use the inequality 

V\,t < 2Var(^ A)t ) +2t-V| < 02^^. 
In either case, we have therefore for all |A| < j and all t > 0, 

V^<C2W*p. 
To apply the Gaussian approximation to T £ r t \(f3\ ; t) note that 

nT £ (t)(kt) ~ Pxfif < 2E(T £(t) x>t ) -T £{t) {^ t )f + 2E(T £{t) (f) x ,t) - Px fl ? ■ 

Since the mapping y — » T(y,e) is a contraction (see [8]) regardless of the 
value of e, it follows that 

HT £ (t)0x,t) ~ Px,of < 2V x ,t + 2r(e(t); t~ 1/2 a x ; &,„), 

where r(e(t); cr; /?) is the Gaussian mean squared error E(T £ (/3 + aZ) — (3) 2 
for estimation of (5 from a single Gaussian observation with mean f3 and 
variance cr 2 . Since all intensities h £ F£ (M) are uniformly bounded, we 
have a\ < II /ill oo an d therefore 



(A.ll) E(r e{t) (/3 Ait )-/3 Ai0 ) 2 <2T/ Ait + 2r( £ (t);t- 1 / 2 ||/ l || oo ;/3 Ai0 ). 

Using the level-dependent threshold e(t) = 2 u \ x h~ l l 2 log t| , the upper 
bound in inequality (A.ll), the fact that h belongs to a Besov ball Bp^{M) 
for some finite constant M and the stability property (A. 10), we obtain the 
rate 

/ \ //I / \2s/(2s+2v+d) 

E^(5 t (a t , x )-(u{,Kf)) 2) j =0^-^logt|J 

as a particular case of classical results on soft wavelet thresholding (e.g., 
see [4]). Combining the above upper bound with inequality (A. 9) and using 
again Lemma 3.2 concludes the proof of the theorem. □ 
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